(57) Abstract

PURPOSE: To recognize a subject that consists of a whole and its constituent
parts, such as a face, at high speed and with high accuracy by inputting each
feature part of the subject to an individual neural network.

CONSTITUTION: The face of a person 1 to be inspected is captured by a
television camera 2 directed at it, and the image of the person 1 to be
inspected is displayed on a display monitoring device 3. The person 1 to be
inspected freezes the image and sends an instruction for input definition by
a key operation from a keyboard input device 4. In addition, whether or not
the posture of the face of the person 1 to be inspected is correct is
announced by, for example, a tone. Image data is extracted by masks which
extract the whole and the parts of the subject, and each piece of extracted
image data is supplied to its own neural network, whereby learning and
recognition are performed. Also, parallel processing is performed by
implementing each neural network on a different processor.
Frequently asked questions



Facial Templates: XP-002226854 

1. What is a facial template?

While one can always apply face recognition to two facial images, there are advantages to 
comparing instead a facial image against a facial template: 

1. Speed of comparison. 

2. Storage size. 




Visionics offers templates optimized for speed and/or size, for both identification and 
verification. 



2. What are the types of facial templates used by Visionics for one to many 
searches (identification)? 

Visionics currently uses two types of templates for identification:

a. A small template, currently 88 bytes, is used for fast searching over
the entire query restricted sub-database. This is known as a vector
template.

b. A large template of size 3.5 KB is used for an intensive search over the
top 5% (or less) ordered matches during 1:N matching. This is known as
a full template.

It is possible to use the vector templates alone in some cases. 
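
The two-stage search described above can be sketched as follows. This is only an illustration of the idea, not the Visionics implementation: `vector_score` and `full_score` are hypothetical similarity functions supplied by the caller (higher means more similar), and the data layout is an assumption. The 5% cutoff is the figure quoted in the text.

```python
import math
from typing import Callable, Dict, List, Tuple


def two_stage_search(
    probe: Tuple[object, object],
    database: Dict[str, Tuple[object, object]],
    vector_score: Callable[[object, object], float],
    full_score: Callable[[object, object], float],
    intensive_fraction: float = 0.05,
) -> List[Tuple[str, float]]:
    """Rank database entries against a probe in two passes.

    Each entry (and the probe) is a pair: (vector template, full template).
    Both scoring functions are hypothetical stand-ins for the real matcher.
    """
    probe_vec, probe_full = probe

    # Stage 1: fast pass over every record using the small vector templates.
    coarse = sorted(
        ((vector_score(probe_vec, vec), pid) for pid, (vec, _) in database.items()),
        reverse=True,
    )

    # Keep only the top fraction of the ordered matches (at least one record).
    keep = max(1, math.ceil(intensive_fraction * len(coarse)))
    shortlist = [pid for _, pid in coarse[:keep]]

    # Stage 2: intensive pass over the shortlist using the full templates.
    rescored = [(pid, full_score(probe_full, database[pid][1])) for pid in shortlist]
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored
```

Setting `intensive_fraction` to 0 still re-scores one record; a vector-only search would simply return the stage 1 ordering.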

3. What are the types of facial templates used by Visionics for one to one matching 
(verification)? 

a. A cropped and scaled image of the person's face can be used as the
template, known as a big canonical image.

b. A highly compressed image of 100 to 300 bytes, known as a compressed face.

Using a big canonical image for verification avoids the processing time required to create a
vector template. In addition, these images typically compress to 1.8K in size using JPEG
compression with compression factor 80, and are smaller than the full template.

4. Can the identity of the person be obtained from the vector or full templates? 

No. The facial image cannot be reconstructed from these templates. 



5. What is a canonical or big canonical image? 




The original image. 



From an original image, rotate, scale, and crop the image so that the eyes appear in the
same position and the image size is 80 pixels by 100 pixels. This generates a canonical
image.

Canonical Image

A big canonical image is similar except that more area around the face is kept. This is a
more robust template for verification due to internal algorithmic processes.
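
The rotate, scale, and crop step can be sketched as a similarity transform fixed by the two eye positions. This is a minimal sketch under stated assumptions: the target eye coordinates inside the 80 by 100 pixel frame are hypothetical (the text does not give them), and "left" and "right" refer to the eyes as they appear in the image.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]


def eye_alignment_transform(
    left_eye: Point,
    right_eye: Point,
    target_left: Point = (24.0, 40.0),   # assumed position in the 80x100 frame
    target_right: Point = (56.0, 40.0),  # assumed position in the 80x100 frame
) -> Callable[[Point], Point]:
    """Return a function mapping original-image coordinates to canonical ones.

    The mapping is a rotation, a uniform scale, and a translation chosen so
    that the two detected eyes land exactly on the two target positions.
    """
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    tx, ty = target_right[0] - target_left[0], target_right[1] - target_left[1]
    src_dist = math.hypot(dx, dy)
    if src_dist == 0:
        raise ValueError("eye positions must be distinct")

    scale = math.hypot(tx, ty) / src_dist
    angle = math.atan2(ty, tx) - math.atan2(dy, dx)
    a, b = scale * math.cos(angle), scale * math.sin(angle)

    def transform(p: Point) -> Point:
        # Work relative to the left eye, rotate and scale, then translate.
        x, y = p[0] - left_eye[0], p[1] - left_eye[1]
        return (target_left[0] + a * x - b * y, target_left[1] + b * x + a * y)

    return transform
```

To render the canonical image itself, one would sample the original image through the inverse of this mapping for each of the 80 by 100 output pixels; a big canonical image would use the same transform with a larger output frame.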




Big Canonical Image 



6. Why is Visionics face compression better than any other standard form of
compression?

... much better compression of faces than ... designed for use with Visionics face
recognition technology.

7. How small can a facial image be compressed?

Down to roughly 100 bytes.
Uncompressed Masked Image



1000 bytes 300 bytes 100 bytes 

The compression API also lets you view the full image, without the mask. Note that the non- 
facial portion does not reconstruct as well, but since it does not contribute to the person's 
identity from the Visionics algorithm point of view, the data is not important. 



100 bytes unmasked 



8. How many templates or images should be stored for optimal performance for 
verification systems? 

For optimal performance during verification or surveillance, multiple varying images should 
be used to recognize a single person. For verification, three to nine images are typical for 
optimal performance. 



Face Recognition: 

1. What scientific method does FaceIt® use for face recognition?

Fundamental to any face recognition system is the way in which faces are coded. FaceIt®
uses Local Feature Analysis (LFA) to represent facial images in terms of local statistically
derived building blocks.

LFA is a mathematical technique that is based on the realization that all facial images can be 
synthesized from an irreducible set of building elements. These elements are derived from a 
representative ensemble of faces using sophisticated statistical techniques. 

They span multiple pixels (but are still local) and represent universal facial shapes, but are 
not exactly the commonly known facial features. In fact, there are many more facial building 
elements than there are facial parts. However, it turns out that synthesizing a given facial 
image, to a high degree of precision, requires only a small subset (12-40 characteristic 
elements) of the total available set. Identity is determined not only by which elements are 
characteristic, but also by the manner in which they are geometrically combined (i.e. their 
relative positions). 



2. How do changes in expression, such as smiling, frowning or blinking affect
FaceIt® face recognition?

LFA has advantages over earlier approaches. Compared with "eigenfaces", LFA face 
recognition is relatively insensitive with respect to changes in expression, including blinking, 
frowning, and smiling. 



3. Is FaceIt® face recognition sensitive to the growth of facial hair?

No, LFA has enough redundancy and robustness to be able to compensate for mustache or 
beard growth. 



4. Is FaceIt® face recognition sensitive to hairstyle?

No, the hair is not used as a local feature. 



5. Does FaceIt® use neural network technology?

The algorithms have been "trained" on human faces to determine the correct significance of ...

... are clearly visible.

... invariant with respect to race or gender.

9. Can an image captured with a digital camera be matched against a digital video ...?

... are applied to make the matching relatively invariant with respect to
input type.

10. What are the major causes of face recognition failure? 

Lack of resolution, in pixels, of the face.

11. Can the face ... product (e.g. CD-Fit, comPHOTOfit, Suspect ID) ... sketch images?

... well with high contrast "cartoon-like" images.

12. Can Visionics' face recognition accurately match an image created with an
aging product against an actual image?

a. Our algorithms have been modified to better recognize infants and
small children.

... the person is not recognized.

On average, accuracy is characterized in terms of two probabilities at a given threshold.

1. False Acceptance Rate (FAR): The chance that an imposter will be recognized
(obtain a higher score) at a certain threshold.

2. False Rejection Rate (FRR): The chance that the correct person will not obtain a
score above a certain threshold.

... threshold.
14. How are the FAR and FRR and Equal Error Rates determined?

These numbers are determined by applying the face recognition algorithm to a large
database of faces, where the correct matches have been pre-determined.

The accuracy of the face recognition system is strongly dependent on the database used for 
analysis. In the results below, we use the publicly available portion of the FERET (provided 
by the U.S. Army Research Lab, Washington D.C., USA) database. This database is available 
upon request. 
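
The procedure can be sketched in a few lines. This is a generic illustration, not Visionics' evaluation code: it assumes the database run has already produced two lists of match scores, `genuine` (correct-person comparisons) and `impostor` (wrong-person comparisons), and it treats a score at or above the threshold as an acceptance, which is a convention chosen here.

```python
from typing import List, Tuple


def far_frr(genuine: List[float], impostor: List[float], threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) at a given threshold."""
    # FAR: fraction of impostor comparisons that are (wrongly) accepted.
    far = sum(score >= threshold for score in impostor) / len(impostor)
    # FRR: fraction of genuine comparisons that are (wrongly) rejected.
    frr = sum(score < threshold for score in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine: List[float], impostor: List[float]) -> Tuple[float, float]:
    """Scan the observed scores as candidate thresholds.

    Returns (threshold, rate) at the point where FAR and FRR are closest;
    the rate reported is the mean of the two at that threshold.
    """
    best = None
    for threshold in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, threshold, (far + frr) / 2)
    return best[1], best[2]
```

Raising the threshold lowers the FAR and raises the FRR, so the two curves cross at a single operating point, which is the Equal Error Rate.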



15. Do the FAR and FRR values depend upon the database used in the analysis? 

Yes. The False Rejection Rate (FRR) is strongly sensitive to the database used to calculate
the result. This is because the rejection of an individual may occur simply because the image
quality of the database is poor.

The False Acceptance Rate (FAR) is less sensitive to the image quality of the database.
However, databases still vary in recognition "difficulty".

Therefore, both the FRR and FAR (to a lesser extent) are sensitive to the database used for 
analysis. 



16. Is there a standard database that can be used to judge all facial technologies 
on the same "playing field"? 

The closest to a standard database is the US Army FERET (FacE REcognition Technology) database.
In the results below, we use the publicly available portion of the FERET database (provided
by the U.S. Army Research Lab, Washington D.C., USA). This database is available upon
request.



17. What is the performance of Visionics face recognition on this database? 

The Equal Error Rate is 0.68% (less than one percent).



Face Finding: 

1. What scientific method is used for face finding? 

FaceIt® face finding uses a combination of geometrical cues and pattern matching to find
heads and facial features. Visionics face finding can detect simultaneously the presence of
multiple faces in an image or in video frames, and can accurately determine the position of



2. Does the user have to supply clues for the face finding to work? 

No, the entire face finding process is fully automated, continuous and functions in real-time
on a standard off-the-shelf processor. The user does not have to click on the image as a clue.

3. Does the person have to be facing the camera for the face finding to work? 

No, the face finding technology can find a face as long as both eyes are clearly visible. 

4. Does the capturing program automatically measure the pose-offset angle of the 
face? 

No, the pose is not currently estimated. 

5. Can the distance from the face automatically be measured? 

Visionics technology returns the eye positions and subsequently the size of the face in the 
image. 

6. Does the face finding technology require the face to be of a certain size or in a
certain position?

No, Visionics technology finds faces at arbitrary scale (up to some resolution) and
subsequently generates a re-scaled face in a standard size for facial matching.

7. How accurate is the face finding? 

The face finding technology finds faces and returns a score indicating the "goodness" of the
face found (known as the alignment quality). This way, the small percentage (of order 1%)
of poorly aligned faces can be manually aligned to complete the template creation process on a
large database. For example, to manually align a one million-person database at 10
alignments (selecting the eye positions) per minute and eight-hour days, it would take 208
man-days. On the other hand, it would only take 8 days to automatically align the faces at
an alignment speed of one per second.
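
The manual figure follows from simple arithmetic, reproduced here as a small sketch. The function names and the assumption that automatic alignment runs around the clock are illustrative and not taken from the source.

```python
def manual_alignment_days(n_faces: int, per_minute: float = 10.0, hours_per_day: float = 8.0) -> float:
    """Working days needed to align n_faces by hand."""
    return n_faces / (per_minute * 60.0 * hours_per_day)


def automatic_alignment_days(n_faces: int, per_second: float = 1.0, hours_per_day: float = 24.0) -> float:
    """Days needed to align n_faces automatically (assumes continuous running)."""
    return n_faces / (per_second * 3600.0 * hours_per_day)


if __name__ == "__main__":
    print(round(manual_alignment_days(1_000_000)))         # 208 man-days
    print(round(automatic_alignment_days(1_000_000), 1))   # 11.6 days at exactly 1 per second
```

At exactly one alignment per second, running continuously, the automatic estimate comes to roughly 11.6 days. That is the same order of magnitude as the 8 days quoted above; the source does not state the assumptions behind its figure.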

8. Where is the face finding technology in the Visionics SDKs? 

In the ILocate COM object, the CVerify ActiveX control, and the CLocateXX C++ classes in
the FaceIt® Library.



Tracking: 

1. What is the scientific method used for tracking? 

Once the face is found, the tracking algorithm follows the face based on the person's 
geometrical facial characteristics and their flesh tone. 



... found and the tracking will still follow the person.

3. Are there any special ... provide color ...

4. Where is ...

In the ITracking ...
the FaceIt Library



Liveness: 

... that person. Visionics provides two Liveness tests.

1. Single frame/image testing, which attempts to detect the characteristics of a
photograph, such as rectangular borders.
2. Multiple frame/video testing, which is a challenge response system.

It asks the user to blink or smile, but not both at the same time.

2. How long does ... multiple frame "liveness" ...?

... takes user input, typically 2-3 seconds.

3. Are there any special hardware requirements for "liveness"? 

No. 



Image Quality: 

... take another picture if the quality
test fails.

2. How long does it take to test for Image Quality? 

Less than 1/10th of a second.

3. What are the Image Quality tests? 

1. Quality of the face found in the image.

2. Size and position of face. Reject faces if cut off in the image or if too small.
3. Overexposure: Verify that the facial image is not overexposed (too light).
4. Underexposure: Verify that the facial image is not underexposed (too dark).
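
Tests 2 to 4 can be sketched as below. This is a rough illustration only: the thresholds (`dark_limit`, `light_limit`, `min_width`) and the use of mean brightness as the exposure measure are assumptions, not the actual Visionics criteria, and the face box is assumed to come from a separate face finder.

```python
from typing import List, Tuple


def exposure_check(gray: List[List[int]], dark_limit: float = 50.0, light_limit: float = 205.0) -> str:
    """Classify an 8-bit grayscale face region by its mean brightness.

    The limits are hypothetical; a real system would tune them on data.
    """
    pixels = [value for row in gray for value in row]
    mean = sum(pixels) / len(pixels)
    if mean < dark_limit:
        return "underexposed"   # too dark
    if mean > light_limit:
        return "overexposed"    # too light
    return "ok"


def face_fits(face_box: Tuple[int, int, int, int], image_size: Tuple[int, int], min_width: int = 40) -> bool:
    """Reject faces that are cut off by the image border or are too small.

    face_box is (left, top, right, bottom) in pixels; image_size is (width, height).
    """
    left, top, right, bottom = face_box
    width, height = image_size
    inside = left >= 0 and top >= 0 and right <= width and bottom <= height
    return inside and (right - left) >= min_width
```

An application would typically prompt for another picture whenever either check fails.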



N = Nv / (1 + F * Nv / Ni), where
Nv = the number of searches per minute in vector mode,
Ni = the number of searches per minute in intensive mode,
F = the fraction of the database searched in intensive mode.

For example, Nv = 45,000K, Ni = 15K, F = 0.005 (0.5%) yields an overall search speed of
roughly 2,800,000 searches per minute.
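
The formula can be checked directly. The sketch below simply evaluates it; the function and parameter names are chosen here for illustration. Equivalently, 1/N = 1/Nv + F/Ni: each record costs one vector comparison plus, for the fraction F, one intensive comparison.

```python
def overall_search_speed(nv: float, ni: float, f: float) -> float:
    """Overall searches per minute for a two-stage search.

    nv: searches per minute in vector mode
    ni: searches per minute in intensive mode
    f:  fraction of the database searched in intensive mode
    """
    return nv / (1.0 + f * nv / ni)


if __name__ == "__main__":
    # Values from the example in the text: 45,000K, 15K, 0.5%.
    print(round(overall_search_speed(45_000_000, 15_000, 0.005)))
```

With the example values the result is about 2.81 million searches per minute, which the text rounds to 2,800,000.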



3. How much memory is required to execute the face recognition algorithm? 

1. For matching two vectors, less than 2 kilobytes of RAM are required.

2. For matching an image to a full template, roughly 1.4 megabytes of RAM is
required.



4. How long does it take to create a template? 

On a 500 MHz Pentium III CPU, the template creation times are: 



1. Vector creation time: 1 second.

2. Full template creation time: 1/30th of a second.

3. (Big) Canonical image creation time: 1/50th of a second.



5. How much RAM is required to create a template? 



1. For vector creation, roughly 2 megabytes of RAM are required.

2. For full template creation, roughly 0.5 megabytes of RAM is required.

6. How fast is the face finding search speed? 

Face finding on a typical input image of 400x300 pixels takes between 0.3 and 1.0 seconds 
depending upon the size and quality of the face in the image. 

Note: This time can be reduced if the rough sizes of the faces in the images are known
beforehand. Then the face finding algorithm does not need to spend time looking for faces of 
all sizes. 



7. How much memory is required to execute the face finding algorithm? 

Roughly 5 megabytes of RAM are required to analyze a typical image for a face.



FaceIt DB/Sentinel/Surveillance:

1. What is the search speed of the FaceIt Search Engine in FaceIt DB?

The maximum search speed is roughly 15 million per minute per CPU (500 MHz). The speed
is less than the maximum quoted speed of 47 million due to extra checks performed to
maintain data integrity, and due to the usage of Microsoft Access and the image storage
database.



2. What databases are used by Facelt DB/Sentinel/Surveillance? 

FaceIt DB/Sentinel/Surveillance uses two database systems.

1. A proprietary flat file system to store and quickly retrieve facial templates.

2. Microsoft Access to store images and personal information.



Microsoft Access can slow down dramatically during data insertions and deletions when the
size of the image database grows significantly beyond 30 thousand. Since FaceIt
DB/Sentinel/Surveillance connects to Access via the Microsoft ODBC layer, connecting to
other ODBC compatible databases, such as Microsoft SQL Server, can solve this problem.
However, non-Access databases require a custom setup to work with
DB/Sentinel/Surveillance.



Database Storage: 

1. Do you have any specific storage requirements in regards to 1:N matching? 

Yes, for efficient 1:N matching, a facial template must be created and stored. 

Facial template creation involves finding the face in the image, and then processing the
found face into a byte array that can be later used for efficient facial searching.

... resources need to be allocated to perform ...

... search template.

2. What does the template creation process entail?

... human intervention.

... template creation can be executed on the found face, or template creation ...

3. Do watermarked images affect the ... of ...?

No.

The correct concept ... FaceIt technology as a filter system
... and the combination of facial images and
search templates into ordered lists.

Visionics offers a FaceIt DB Enterprise
database system for template storage. ... However, the technology can be ...

Visionics technology does not include a database.

4. Can your technology work with RDBMS, ODBMS, and ORDBMS?

ORDBMS Examples: Informix

The systems listed above include interfaces for imbedded SQL processing that would
interface well with our technology.

... that can perform the face finding, template ...

... recognition are:

a. Process internal to the database using imbedded SQL processing via a
Datablade-like plug-in.

b. Store only pointers to ... system. We ...

The choice of system depends upon the existing infrastructure. 



... solution.

For single vendor database, single vendor platform infrastructures, a. may be the most cost
effective solution.

5. Can you discuss further an implementation strategy for a large-scale database ...?

Tier 1: Client web-based front end. The clients submit images and perform queries including
face recognition through their browsers.

... to other data sources through ODBC.
... Active Server Pages (ASP) technology.



In addition this tier connects via an internal network messaging system, such as Microsoft 
Message Queue Server (MMQS) or IBM MQSeries, to the image server and the face 
recognition server. 

Tier 3: Image and Person Info Server: Stores digitized images and tags denoting gender,
citizenship, criminal status (if any), etc., all personal information which might be used to
constrain a face recognition search or image lookup. Communicates to Tier 2 via the
network-messaging layer.

Tier 4: Face Recognition and Face Finding Server: Performs all template creation (probably
using automated face finding) and all face recognition queries. Communicates to Tiers 3 and 2
via the network-messaging layer.

In the context of Tier 4, there are two solutions currently implemented by Visionics:

a. Process internal to the database using imbedded SQL processing via a
"Datablade"-like plug-in. The database software handles I/O and queuing.
Visionics would provide the plug-in.

b. Process in the context of a layer such as MMQS, with queuing and I/O
handled by the fully scalable Visionics FaceIt DB Enterprise solution.

In both cases, the same fundamental technology would be used.

6. Volume: Is there a limit to the volume for 1:N matching?

No. 



7. What are the system requirements for a large-scale search system? 

In the context of an external facial system with a legacy database system storing images, a 
set of workstations for facial queries, and a backend system for facial processing, we 
calculate below the number of computers required to meet a specified search specification: 

Computer: 500 MHz CPU system with base RAM of 512MB and base disk space of
10GB.

One can replace a stand-alone computer with a processor in a dedicated multiple processing
system if the system provides the RAM and disk specified above to each CPU.

Disk: The disk storage requirement of roughly 4K for all facial templates per image will
allow storage of 2.8 million individuals.

RAM: The RAM storage requirement of less than 128 bytes per person translates into 320MB
of RAM for fast template storage.

CPU: The overall search speed is roughly 2.8 million per minute per CPU (see the
performance section). This translates into a formula for the number of computers:

N_computers = (Total population size) / (2.8 million * minutes/search), where minutes/search <= 1.

For example, for searching through 20 million in one minute, the number of computers is 8. 

In addition to search and alignment engines, one additional computer may be required to act
as a master controller.
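
The sizing formula can be evaluated as below. This is a minimal sketch: the function name and the rounding up to a whole machine are choices made here, and the default rate is the 2.8 million per minute per CPU figure from the text.

```python
import math


def computers_required(
    population: int,
    minutes_per_search: float = 1.0,
    per_cpu_rate: float = 2_800_000.0,
) -> int:
    """Number of search computers needed to scan `population` records per search.

    minutes_per_search is the allowed time for one full search (at most 1 minute,
    as in the text); per_cpu_rate is records searched per minute per CPU.
    """
    if not 0 < minutes_per_search <= 1:
        raise ValueError("minutes_per_search must be in (0, 1]")
    return math.ceil(population / (per_cpu_rate * minutes_per_search))


if __name__ == "__main__":
    print(computers_required(20_000_000))  # 8, matching the example in the text
```

A master controller, where needed, would be one machine on top of this count.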



Image Input: 

1. Do you have any recommendations for digital cameras to use for the generation 
of a database of images for facial searching? 

Most personal computer based video capture devices are inadequate for high-quality facial 
recognition results. We recommend instead the use of a "Megapixel" digital camera with 
"flash on demand", or a high quality (300 DPI or above) scan of a good quality photograph. 

A good example is the KODAK DC260 Zoom Digital Camera, 1536 x 1024 resolution, 3X 
optical plus 2X digital zoom lens. It is capable of flash sync and manual exposure 
adjustment. The resolution is less important than the flash sync and the quality of the 
exposure. 

In general, there are a large number of adequate digital cameras for the task of photo 
imaging for face recognition tasks or identification card creation. The correct solution for a 
given application depends upon the price and availability at the time of the contract award. 
The price for such cameras has dropped by a factor of two roughly every two years and the 
resolution has also increased by a factor of four over the past year for low-end systems. We 
expect the price trend to continue, while the resolution will remain roughly in the mega-pixel 
range over the next two years.



3. What is the full ... of gray ...?

... be at maximum roughly 80 for JPEG). ... resolution. The entire head, including
... area of the image, and there
... roughly 100 pixels of data from eye to eye.

The best faces for matching are directly facing forward under controlled, balanced lighting,
...

... quality of the automatic face finder.

Other situations to be avoided ... degrees in any direction.

Larger resolution photos can be accepted but generally do not increase accuracy to any
measurable extent and take longer to process.

... grayscale ... with ...
compression, and 20 pixels from eye to eye.

5. If the source and target image formats are different, will the results be the same
as matching using the same format?

... component of the input images. There will ...

... band is robust to 8 bits in both cases.

Video Input: 

1. What video device ...?

We currently support in our products ... VFW.

... video device to our algorithms.

2. Do you support USB cameras?

... software driver for their device, then that device
... software does not connect directly
to hardware.

3. What capture hardware and cameras do you recommend for use with your 
technology? 



USB Desktop Cameras 
The Kodak DVC323 
The Winnov USB camera 

Parallel Port Cameras 
The Vicam camera 



The Sanyo CCD desktop camera is a good example of a high quality, low cost desktop camera.
We also like the Howard HA6800 with zoom lens for high quality video input for desktop
systems.

For demos, teleconferencing, or high-end use, we like the more expensive Sony EVI-D30 pan
tilt zoom auto-focus camera.



The Connectix Quickcam series of cameras is not recommended for use with face recognition
due to below average performance.

Fixed Field Surveillance Cameras 

Pan-Tilt-Zoom Surveillance Cameras 

We have no recommendations for pan/tilt/zoom surveillance camera systems at this time. 

4. What sort of image enhancements might be required based on the video input? 

You should enable automatic gain control on the video camera. 

5. How can the quality of video be controlled to ensure optimal results?

Avoid including a bright light source in the video field of view such as the sun or when 
indoors, a window in the background field of view. 

In general, avoid situations that will generate a photographic back lighting problem.

6. What are the recommended video digitizer settings? 

Analog video input must be digitized (resolved into pixels) before it can be processed for face 
finding and face recognition. The default digitizer setting for most desktop systems is 
160x120 pixels. 

This resolution is not recommended for use with Visionics technology. Instead we 
recommend: 

For desktop verification: RGB888 320x240 pixels. 
For video surveillance: RGB888 640x480 pixels. 

The setting can be adjusted via the manufacturer's digitizer options dialog box, which is a
standard component of VFW software drivers.

7. What is the definition of best quality video input for face finding and face
recognition?

The video digitizer settings are RGB888 640x480. 

... and neck are clearly visible in the video field, and roughly the face takes up
one-third of the video field. Under no conditions should the head be cut off in ...
will compromise the abilities of the automatic face finder.

... so close to the camera as to cause a "fish-eye" effect where the nose ...

The best faces for matching are directly facing forward under controlled, balanced lighting,
with eyes open and a neutral expression like a mug shot photograph.

Other situations to be avoided include glare on eyeglasses that obscures the eyes, sunglasses,
closed eyes, mouths open during speech, strong smiles with exposed teeth, and variations in
pose (left/right) or tilt (up/down) beyond 15 degrees in any direction.

8. What are the minimum video input specifications required for the Visionics 
technology to maintain good effectiveness? 

There should be a minimum of 20 pixels between the eyes. The person's pose should be within
35 degrees of frontal, within 15 if possible.
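
These two minimums can be turned into a simple pre-check, sketched below. The function name is hypothetical, and the pose angle is assumed to come from some external estimate, since the capture software described here does not measure pose itself.

```python
import math
from typing import Tuple


def meets_minimum_video_spec(
    left_eye: Tuple[float, float],
    right_eye: Tuple[float, float],
    pose_degrees: float,
    min_eye_pixels: float = 20.0,
    max_pose_degrees: float = 35.0,
) -> bool:
    """Check the eye-to-eye pixel distance and the off-frontal pose angle."""
    eye_distance = math.hypot(right_eye[0] - left_eye[0], right_eye[1] - left_eye[1])
    return eye_distance >= min_eye_pixels and abs(pose_degrees) <= max_pose_degrees
```

Passing `max_pose_degrees=15.0` applies the stricter, preferred limit.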